Parity and Predictability of Competitions 
We present an extensive statistical analysis of the results of all sports competitions in five major
sports leagues in England and the United States. We characterize the parity among teams by the 
variance in the winning fraction from season-end standings data and quantify the predictability of 
games by the frequency of upsets from game results data. We introduce a novel mathematical model 
in which the underdog team wins with a fixed upset probability. This model quantitatively relates 
the parity among teams with the predictability of the games, and it can be used to estimate the 
upset frequency from standings data. 



What is the most competitive sports league? We answer this question via an extensive statistical survey of game results in five major sports. Previous studies have separately characterized parity (Fort 1995) and predictability (Stern 1997, Wesson 2002, Lundh 2006) of sports competitions. In this investigation, we relate parity with predictability using a novel theoretical model in which the underdog wins with a fixed upset probability. Our results provide further evidence that the likelihood of upsets is a useful measure of competitiveness in a given sport (Wesson 2002, Lundh 2006). This characterization complements the many available statistics on the outcomes of sports events (Albert 2005, Stern 1991, Gembris 2002).

We studied the results of nearly all regular season competitions in five major professional sports leagues in England and the United States (table I): the premier soccer league of the English Football Association (FA), Major League Baseball (MLB), the National Hockey League (NHL), the National Basketball Association (NBA), and the National Football League (NFL). The NFL data include the short-lived AFL. Incomplete seasons, such as the quickly abandoned 1939 FA season, and nineteenth-century results for the National League in baseball were not included. In total, we analyzed more than 300,000 games spanning over a century (data sources: http://www.shrpsports.com/ and http://www.the-english-football-archive.com/).



I. QUANTIFYING PARITY 

FIG. 1: Winning fraction distribution (curves) and the best-fit distributions from simulations of our model (circles). For clarity, FA, which lies between MLB and NHL, is not displayed.

The winning fraction, the ratio of wins to total games, quantifies team strength. Thus, the distribution of winning fraction measures the parity between teams in a league. We computed F(x), the fraction of teams with a winning fraction of x or lower at the end of the season, as well as

$$\sigma = \sqrt{\langle x^2\rangle - \langle x\rangle^2},$$

the standard deviation in winning fraction. Here ⟨·⟩ denotes the average over all teams and all years using season-end standings. In our definition, σ gives a quantitative measure for parity in a league (Fort 1995, Gould 1996). For example, in baseball, where the winning fraction x typically falls between 0.400 and 0.600, the standard deviation is σ = 0.084. As shown in figures 1 and 2a, the winning fraction distribution clearly distinguishes the five leagues. It is narrowest for baseball and widest for football.
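The definition of σ translates directly into code. The sketch below assumes the season-end standings are available as (wins, games) pairs, one per team and season; the function name and the toy standings are hypothetical, and ties are not treated specially.

```python
import math

def winning_fraction_std(records):
    """Standard deviation sigma of the winning fraction x = wins / games.

    records: iterable of (wins, games) pairs, one per team per season,
    taken from season-end standings (hypothetical input format).
    """
    fractions = [wins / games for wins, games in records]
    n = len(fractions)
    mean = sum(fractions) / n                     # <x>
    mean_sq = sum(x * x for x in fractions) / n   # <x^2>
    # sigma = sqrt(<x^2> - <x>^2); max() guards against tiny negative round-off
    return math.sqrt(max(mean_sq - mean * mean, 0.0))

# Toy standings for a four-team, 162-game season (invented for illustration)
standings = [(97, 162), (81, 162), (65, 162), (81, 162)]
print(round(winning_fraction_std(standings), 3))   # prints 0.07
```

In the paper's usage the average runs over every team in every season, so the records from all years would simply be concatenated into one list.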

Do these results imply that MLB games are the most competitive and NFL games the least? Not necessarily! The length of the season is a significant factor in the variability of the winning fraction. In a scenario where the outcome of a game is random, i.e., either team can win with equal probability, the total number of wins performs a simple random walk, and the standard deviation σ is inversely proportional to the square root of the number of games played. Generally, the shorter the season, the larger σ. Thus, the small number of games is partially responsible for the large variability observed in the NFL.
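The coin-flip argument can be made quantitative: if every game is a fair coin toss, a team's wins in n games follow a binomial distribution with variance n/4, so the winning fraction has standard deviation 1/(2√n). The sketch below evaluates this baseline for the average season lengths listed in table I and checks it with a small Monte Carlo run; the function names and the simulation setup are our own assumptions.

```python
import math
import random

def coin_flip_sigma(n_games):
    # Binomial(n, 1/2) wins: std of wins = sqrt(n)/2, so std of wins/n = 1/(2*sqrt(n))
    return 1.0 / (2.0 * math.sqrt(n_games))

def simulated_coin_flip_sigma(n_games, n_teams=20000, seed=1):
    # Monte Carlo check: every team wins each of its n_games with probability 1/2
    rng = random.Random(seed)
    fracs = [sum(rng.random() < 0.5 for _ in range(n_games)) / n_games
             for _ in range(n_teams)]
    mean = sum(fracs) / n_teams
    return math.sqrt(sum((x - mean) ** 2 for x in fracs) / n_teams)

# Average games per team per season, from table I
for league, games in [("MLB", 155.5), ("NFL", 14.0)]:
    print(league, round(coin_flip_sigma(games), 3))
print("NFL-length Monte Carlo:", round(simulated_coin_flip_sigma(14), 3))
```

Both baselines (about 0.040 for MLB and 0.134 for NFL) lie below the observed σ values in table I (0.084 and 0.210), so season length alone does not explain the full spread; the upset model of section II accounts for the rest.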



* Electronic address: ebn@lanl.gov
† Electronic address: fvazquez@buphy.bu.edu
‡ Electronic address: redner@bu.edu



II. QUANTIFYING PREDICTABILITY 

League  Years        Games   ⟨games⟩   σ      q      q_model
FA      1888-2005    43350    39.7    0.102  0.452  0.459
MLB     1901-2005   163720   155.5    0.084  0.441  0.413
NHL     1917-2004    39563    70.8    0.120  0.414  0.383
NBA     1946-2005    43254    79.1    0.150  0.365  0.316
NFL     1922-2004    11770    14.0    0.210  0.364  0.309

TABLE I: Summary of the sports statistics data. Listed are the time periods, the total number of games, the average number of games played by a team in a season (⟨games⟩), the standard deviation of the win-percentage distribution (σ), the measured frequency of upsets (q), and the upset probability obtained using the theoretical model (q_model). The fraction of ties in soccer, hockey, and football is 0.246, 0.144, and 0.016, respectively.

To account for the varying season length and reveal the true nature of the sport, we set up artificial sports leagues where teams, paired at random, play a fixed number of games. In this simulation model, the team with the better record is considered the favorite and the team with the worse record is considered the underdog. The outcome of a game depends on the relative team strengths: with "upset probability" q < 1/2, the underdog wins; otherwise, the favorite wins. If the two teams have the same fraction of wins, one is randomly selected as the winner.
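The game rule above is small enough to state in code. This sketch assumes both teams have played the same number of games, so comparing win counts is equivalent to comparing records; the function name and signature are hypothetical.

```python
import random

def play_game(wins_a, wins_b, q, rng):
    """Return 0 if team a wins and 1 if team b wins under the upset model.

    Assumes both teams have played the same number of games, so comparing
    win counts is equivalent to comparing records. q < 1/2 is the upset probability.
    """
    if wins_a == wins_b:
        return rng.randrange(2)                   # equal records: random winner
    favorite, underdog = (0, 1) if wins_a > wins_b else (1, 0)
    return underdog if rng.random() < q else favorite

rng = random.Random(0)
# Team a (5 wins) is the favorite, team b (3 wins) the underdog, q = 0.4
outcomes = [play_game(5, 3, 0.4, rng) for _ in range(100_000)]
print(sum(outcomes) / len(outcomes))   # fraction of upsets, close to 0.4
```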

We note that a similar methodology was used by Wesson, who focused on the upset likelihood as a function of the final point spread in soccer (Wesson 2002). Also, an equivalent definition of the upset frequency was recently employed by Lundh to characterize how competitive tournaments are in a variety of team sports (Lundh 2006).

Our analysis of the nonlinear master equations that describe the evolution of the distribution of team win/loss records shows that σ decreases both as the season length increases and as games become more competitive, i.e., as q increases. This theory is described in the appendix and more generally in Ben-Naim et al. 2006. The basic quantity to characterize team win/loss records is F(x), the fraction of teams that have a winning fraction less than or equal to x. In a hypothetical season with an infinite number of games, the winning fraction distribution is uniform, so its cumulative distribution is piecewise linear:

$$F(x) = \begin{cases} 0, & x < q, \\ \dfrac{x - q}{1 - 2q}, & q < x < 1 - q, \\ 1, & 1 - q < x. \end{cases} \qquad (1)$$

From the definition of the upset probability, the lowest winning fraction must equal q, while the largest winning fraction must be 1 − q.
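Equation (1) can be evaluated directly. The sketch below (function name hypothetical) returns the infinite-season cumulative distribution for an upset probability 0 ≤ q < 1/2.

```python
def F_infinite(x, q):
    """Cumulative win-fraction distribution F(x) of Eq. (1), infinite season."""
    if not 0.0 <= q < 0.5:
        raise ValueError("upset probability must satisfy 0 <= q < 1/2")
    if x <= q:
        return 0.0                      # no team wins less than a fraction q
    if x >= 1.0 - q:
        return 1.0                      # no team wins more than a fraction 1 - q
    return (x - q) / (1.0 - 2.0 * q)    # linear ramp = uniform density on [q, 1-q]

q = 0.25
for x in (0.2, 0.25, 0.5, 0.6, 0.75, 0.9):
    print(x, F_infinite(x, q))
```

The slope 1/(1 − 2q) of the ramp is the uniform density; as q approaches 1/2 the ramp steepens into a step at x = 1/2, i.e. perfect parity.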

FIG. 2: (a) The cumulative standard deviation σ of the winning fraction distribution (for all seasons up to a given year) versus time. (b) The cumulative frequency of upsets q, measured directly from game results, versus time.

By straightforward calculation from F(x), the standard deviation σ is a linear function of the upset probability:

$$\sigma = \frac{1/2 - q}{\sqrt{3}}. \qquad (2)$$

Thus, the larger the probability that the stronger team wins, the greater the disparity between teams. Perfect parity is achieved when q = 1/2, where the outcome of a game is completely random. However, for a finite and realistic number of games per season, such as those that occur in sports leagues, we find that the standard deviation is larger than the infinite-game limit given in Eq. (2). As a function of the number of games, the standard deviation decreases monotonically, and it ultimately reaches the limiting value (2).
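The infinite-season relation σ = (1/2 − q)/√3 follows from the variance of a uniform distribution of width 1 − 2q, namely (1 − 2q)²/12. The sketch below evaluates it for the measured upset frequencies of table I and sets the result beside the observed σ, illustrating that real, finite seasons produce a larger spread than the infinite-season limit. The helper name is hypothetical.

```python
import math

def sigma_infinite(q):
    # Uniform density on [q, 1 - q]: variance = (1 - 2q)^2 / 12,
    # so sigma = (1 - 2q) / sqrt(12) = (1/2 - q) / sqrt(3)
    return (0.5 - q) / math.sqrt(3.0)

# (measured q, observed sigma) from table I
table_1 = {"FA": (0.452, 0.102), "MLB": (0.441, 0.084), "NHL": (0.414, 0.120),
           "NBA": (0.365, 0.150), "NFL": (0.364, 0.210)}

for league, (q, sigma_obs) in table_1.items():
    print(f"{league}: infinite-season sigma {sigma_infinite(q):.3f}, observed {sigma_obs:.3f}")
```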

We ran numerical simulations of these artificial sports leagues by simply following the rules of our theoretical model. In a simulated game, the records of each team are updated according to the following rule: if the two teams have different fractions of wins, the favorite wins with probability 1 − q and the underdog wins with probability q. If the two teams are equal in strength, the winner is chosen at random. Using the simulations, we determined the value q_model that gives the best match between the distribution F(x) from the simulations and the actual sports statistics (figure 1). Generally, the simulation results agree well with the data, and the best-fit values q_model are listed in table I.
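A season of such an artificial league can be simulated in a few lines. In this sketch, which reflects our assumptions about the setup rather than the authors' code, teams are shuffled and paired at random in each round, so every team plays the same number of games. The best q is then chosen by matching only the standard deviation σ, a simplification of fitting the full distribution F(x) as done in figure 1. Function names, the grid, and the NFL-like team count are hypothetical.

```python
import math
import random

def simulate_sigma(q, n_teams, n_games, n_seasons=100, seed=0):
    """Std of the winning fraction produced by the upset model (sketch).

    Each round, the teams are shuffled and paired at random, so every team
    plays exactly n_games games. n_teams must be even.
    """
    if n_teams % 2:
        raise ValueError("n_teams must be even for round-based pairing")
    rng = random.Random(seed)
    fracs = []
    for _ in range(n_seasons):
        wins = [0] * n_teams
        for _ in range(n_games):
            order = list(range(n_teams))
            rng.shuffle(order)
            for a, b in zip(order[::2], order[1::2]):
                if wins[a] == wins[b]:
                    winner = a if rng.random() < 0.5 else b    # equal records
                else:
                    fav, dog = (a, b) if wins[a] > wins[b] else (b, a)
                    winner = dog if rng.random() < q else fav  # upset with prob. q
                wins[winner] += 1
        fracs.extend(w / n_games for w in wins)
    mean = sum(fracs) / len(fracs)
    return math.sqrt(sum((x - mean) ** 2 for x in fracs) / len(fracs))

def fit_q(sigma_observed, n_teams, n_games, grid=None):
    # Crude grid search: pick the q whose simulated sigma is closest to the data
    grid = grid or [i / 100 for i in range(25, 50)]
    return min(grid, key=lambda q: abs(simulate_sigma(q, n_teams, n_games) - sigma_observed))

# Hypothetical NFL-like league: 28 teams, 14 games (team count is an assumption)
print(fit_q(0.210, n_teams=28, n_games=14))
```

Using a fixed seed for every candidate q (common random numbers) keeps the comparison between grid points from being dominated by sampling noise.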

To characterize the predictability of games directly from the game results data, we followed the chronologically ordered results of all games and reconstructed the league standings on any given day. We then measured the upset frequency q by counting the fraction of times that the team with the worse record on the game date actually won (table I). Games between teams with no record (start of a season) or teams with equal records were disregarded. Game location was ignored and so was the margin of victory. In soccer, hockey, and football, ties were counted as 1/2 of a victory for the underdog and 1/2 of a victory for the favorite. We verified that this definition did not have a significant effect on the results: the upset probability changes by at most 0.02 (and typically much less) if ties are ignored altogether. We note that to generalize our model to situations with ties, it is straightforward to add a second parameter, the probability of a tie, to the model definition.
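The counting procedure can be sketched for a single season as follows. Choices not spelled out in the text are our assumptions and are labeled in the comments: standings are updated after every game rather than once per day, a game is skipped if either team has no record yet, and a tie adds half a win to each team's standing as well as half an upset to the count. The function name and input format are hypothetical.

```python
from collections import defaultdict

def upset_frequency(games):
    """Fraction of counted games won by the team with the worse record.

    games: chronologically ordered (team_a, team_b, result) tuples for one
    season, with result = 1.0 if team_a won, 0.0 if team_b won, 0.5 for a tie.
    """
    wins = defaultdict(float)    # assumption: a tie adds 0.5 to each side
    played = defaultdict(int)
    upsets, counted = 0.0, 0
    for a, b, result in games:
        # Assumption: skip the game if either team has no record yet
        if played[a] and played[b]:
            frac_a = wins[a] / played[a]
            frac_b = wins[b] / played[b]
            if frac_a != frac_b:                     # equal records are disregarded
                # underdog's score: 1 for a win, 1/2 for a tie, 0 for a loss
                upsets += result if frac_a < frac_b else 1.0 - result
                counted += 1
        # Assumption: standings are updated after every game, not once per day
        wins[a] += result
        wins[b] += 1.0 - result
        played[a] += 1
        played[b] += 1
    return upsets / counted if counted else float("nan")

season = [("A", "B", 1.0), ("C", "D", 1.0), ("B", "C", 1.0),
          ("A", "D", 1.0), ("D", "C", 0.5)]
print(upset_frequency(season))
```

In this toy season the first two games are skipped (no records yet), the third is an upset, the fourth is won by the favorite, and the fifth is a tie counted as half an upset, giving 1.5/3 = 0.5.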

Our main result is that soccer and baseball are the most competitive sports, with q = 0.452 and q = 0.441, respectively, while basketball and football, with nearly identical q = 0.365 and q = 0.364, are the least (Stern 1997, Stern 1998).

There is also good agreement between the upset probability q_model, obtained by fitting the winning fraction distribution from numerical simulations of our model to the data as in figure 1, and the measured upset frequency (table I). We notice, however, a systematic bias in the estimated upset frequencies: the discrepancy between q and q_model grows as the games become less competitive. Consistent with our theory, the standard deviation σ mirrors the bias 1/2 − q (figures 2a and 2b). Tracking the evolution of either q or σ leads to the same conclusions: (1) MLB games have been steadily becoming more competitive (Gould 1996); (2) the NFL has dramatically improved the competitiveness of its games over the past 40 years; and (3) over the past 60 years, the FA displays the opposite trend, with games becoming less competitive.



III. ALL-TIME TEAM RECORDS 

In our theory, both the season length and the upset probability affect the width of the win fraction distribution. However, in a hypothetical season with an infinite number of games, the distribution is governed by the upset probability alone. In this case, the bias 1/2 − q and the standard deviation σ are equivalent measures of competitiveness, as indicated by Eq. (2).

The all-time records of teams provide the longest possible win-loss records. This comparison is of course limited by the small number of teams, which varies between 26 and 37 (we ignored defunct franchises and franchises participating in fewer than 10 seasons), and by the significant variations in the total number of games played by the teams. Interestingly, F(x) obtained from the all-time win-loss records is reasonably close to the uniform distribution suggested by the theory (figure 3 and table II). The slope of the line in figure 3 was obtained using the theory: the upset probability q_all was estimated from the observed standard deviation σ_all using Eq. (2). This provides additional support for the theoretical model.
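Inverting the infinite-season relation σ = (1/2 − q)/√3 gives q = 1/2 − √3 σ. Applied to the σ_all values of table II, this short sketch (helper name hypothetical) reproduces the q_all column to three decimals.

```python
import math

def q_from_sigma(sigma):
    # Invert sigma = (1/2 - q)/sqrt(3); valid only in the infinite-season limit
    return 0.5 - math.sqrt(3.0) * sigma

sigma_all = {"FA": 0.035, "MLB": 0.024, "NHL": 0.044, "NBA": 0.057, "NFL": 0.057}
for league, s in sigma_all.items():
    print(league, round(q_from_sigma(s), 3))
```

The same inversion is not appropriate for the single-season σ values of table I: those seasons are far from the infinite-game limit, so their larger σ would lead to an underestimate of q, which is why q_model is obtained from simulations instead.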

Overall, the win fraction distribution for the teams' all-time winning records is in line with the rest of our findings: soccer and baseball are the most competitive sports while basketball and football are the least. We note that the win fraction distribution is extremely narrow, and the closest to a straight line, for baseball because of the huge number of games. Even though the total number of games in basketball is four times that of football, the two distributions have comparable widths. The fact that similar trends for the upset frequency emerge from game records as from all-time team records indicates that the relative strengths of clubs have not changed considerably over the past century.

FIG. 3: The all-time cumulative win-fraction distribution for active teams with a 10-season minimum. For clarity, FA and NBA data are not displayed. The theoretical curves for an infinite season, using q_all obtained by substituting σ_all into Eq. (2), are shown for reference (table II).

League  Teams  ⟨Games⟩  σ_all  q_all  x_max        x_min
FA      37      2154    0.035  0.439  0.582 (LPL)  0.406 (CP)
MLB     26     13100    0.024  0.458  0.567 (NYY)  0.459 (SDP)
NHL     26      2850    0.044  0.424  0.589 (MC)   0.403 (TBL)
NBA     27      3060    0.057  0.401  0.616 (LAL)  0.358 (LAC)
NFL     31       720    0.057  0.401  0.600 (MD)   0.358 (TBB)

TABLE II: Summary of the sports statistics data presented in figure 3. The average number of games played by teams since their inception is denoted by ⟨Games⟩. The quantity σ_all is the standard deviation in the all-time winning percentage of the roughly 30 sports clubs, and q_all is the upset probability estimated from σ_all using Eq. (2). The maximal and minimal fractions of wins for individual teams are indicated by x_max and x_min, respectively. The team acronyms are: (LPL) Liverpool, (CP) Crystal Palace, (NYY) New York Yankees, (SDP) San Diego Padres, (MC) Montreal Canadiens, (TBL) Tampa Bay Lightning, (LAL) Los Angeles Lakers, (LAC) Los Angeles Clippers, (MD) Miami Dolphins, (TBB) Tampa Bay Buccaneers.



IV. DISCUSSION 

In summary, we propose a single quantity, q, the frequency of upsets, as an index for quantifying the predictability, and hence the competitiveness, of sports leagues. This quantity complements existing methods that account for seasons of varying length, in particular measures of competitive balance based on standard deviations in winning percentages (Fort 2003). We demonstrated the utility of this measure via a comparative analysis showing that soccer and baseball are the most competitive sports. Trends in this measure may reflect the gradual evolution of the teams in response to competitive pressure (Gould 1996, Lieberman 2005), as well as changes in game strategy or rules (Hofbauer 1998). What plays the role of fitness in this context is an open question.

In our definition of the upset frequency, we ignored issues associated with unbalanced schedules, unestablished records, and variations in team strengths. For example, we count a game in which a 49-50 team beats a 50-49 team as an upset. To assess the importance of this effect, we ignored all games between teams separated by less than 0.05 in win percentage. We find that the upset frequency changes by less than 0.005 on average for the five sports. Also, one may argue that team records at the beginning of the season are not well established and that there are large variations in schedule strength. To quantify this effect, we ignored the first half of the season. Remarkably, this changes the upset frequency by less than 0.007 on average. We conclude that issues associated with strength of schedule and unbalanced schedules have negligible influence on the upset frequency.
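Both robustness checks are filters applied before averaging. The sketch below assumes each counted game has already been reduced to a tuple (favorite's winning fraction, underdog's winning fraction, underdog's score, fraction of the season already played); this input format and the function name are hypothetical.

```python
def filtered_upset_frequency(snapshots, min_gap=0.0, skip_first_fraction=0.0):
    """Upset frequency over the games that survive two robustness filters.

    snapshots: (favorite_frac, underdog_frac, underdog_score, season_progress)
      favorite_frac, underdog_frac: winning fractions just before the game
      underdog_score: 1.0 underdog won, 0.5 tie, 0.0 underdog lost
      season_progress: fraction of the season already played (0 to 1)
    min_gap: drop games whose records differ by less than this (e.g. 0.05)
    skip_first_fraction: drop games played before this point (e.g. 0.5)
    """
    kept = [score for fav, dog, score, progress in snapshots
            if fav - dog >= min_gap and progress >= skip_first_fraction]
    return sum(kept) / len(kept) if kept else float("nan")

snapshots = [(0.60, 0.40, 1.0, 0.2), (0.52, 0.50, 0.0, 0.7),
             (0.70, 0.30, 0.0, 0.8), (0.55, 0.45, 0.5, 0.9)]
print(filtered_upset_frequency(snapshots))                           # all games
print(filtered_upset_frequency(snapshots, min_gap=0.05))             # drop close records
print(filtered_upset_frequency(snapshots, skip_first_fraction=0.5))  # drop first half
```

On real data, the differences between the filtered and unfiltered values are the quantities the text reports (below 0.005 and 0.007 on average); the toy numbers here are invented and shift much more.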

It is worth mentioning that our model does not account for several important aspects of real sports competitions. Among the plethora of such issues, we list a few prominent examples: (i) Game location. Home and away games are not incorporated into our model, but game location does affect the outcome of games. For example, during the 2005 baseball season, 54% of the total team wins occurred at home. (ii) Unbalanced schedule. In our fixed-game algorithm, each team plays all other teams the same number of times. However, some sports leagues are partitioned into much smaller subdivisions, with teams playing a larger fraction of their schedule against teams in their own subgroup. This partitioning is effectively the same as reducing the number of teams, an effect that we found has a small influence on the distribution of win fraction. (iii) Variable upset probability. It is plausible that the upset probability q depends on the relative strengths of the two competing teams. It is straightforward to generalize the model so that the upset frequency depends on the relative strengths of the two teams, and this may be especially relevant for tournament competitions.

Despite all of these simplifying assumptions, we see the strength of our approach in its simplicity. Our theoretical model involves a single parameter and consequently enables a direct and unambiguous quantitative relation between parity and predictability.

Our model, in which the stronger team is favored to win a game, enables us to take into account the varying season length, and it directly relates parity, as measured by the standard deviation σ, with predictability, as measured by the upset likelihood q. This connection has practical utility, as it allows one to conveniently estimate the likelihood of upsets from the more easily accessible standings data. In our theory, all teams are equal at the start of the season, but by chance, some end up strong and some weak. Our idealized model does not include the notion of innate team strength; nevertheless, the spontaneous emergence of disparate-strength teams provides the crucial mechanism needed for quantitative modeling of the complex dynamics of sports competitions.

One may speculate on the changes in competitiveness over the years. In football there is a dramatic improvement in competitiveness, indicating that actions taken by the league, including revenue sharing, the draft, and unbalanced schedules with stronger teams playing a tougher schedule, are very effective. In baseball, arguably the most stable sport, the gentle improvement in competitiveness may indeed reflect natural evolutionary trends. In soccer, the decrease in competitiveness over the past 60 years indicates a "rich gets richer" scenario.
APPENDIX A: THE THEORETICAL MODEL

In our model, there are N teams that compete against each other. In each game there is one winner and one loser, and no ties are possible. In each competition, the team with the larger number of wins is considered the favorite, and the other team the underdog. The winner of each competition is determined by the following rule: the underdog wins with upset probability q, and the favorite wins with probability p = 1 − q. If the two competing teams have identical records, the winner is chosen randomly.

Let k and j be the numbers of wins of the two competing teams. When k > j, the outcome of a game is as follows:

$$(k, j) \to (k, j+1) \quad \text{with probability } q, \qquad (k, j) \to (k+1, j) \quad \text{with probability } 1 - q.$$

Our theoretical analysis is based on a kinetic approach. 
We set the competition rate to 1/2, so that the time 
increases by one when every team plays one game, on 
average. Then the average number of games played by a 
team, t, plays the role of time. Also, we take the limit 
of large t so that fluctuations in the number of games 
vanish. 

FIG. 4: The win-fraction distribution for q = 1/4 at times t = 100 and t = 500.

Let $g_k(t)$ be the fraction of teams with k wins at time t. We address the case where any two teams are equally likely to play against each other. Then, the win-number distribution obeys the master equation (Ben-Naim 2006)

$$\frac{dg_k}{dt} = (1-q)\left(g_{k-1}G_{k-1} - g_k G_k\right) + q\left(g_{k-1}H_{k-1} - g_k H_k\right) + \frac{1}{2}\left(g_{k-1}^2 - g_k^2\right). \qquad (A1)$$

Here $G_k = \sum_{j=0}^{k-1} g_j$ and $H_k = \sum_{j=k+1}^{\infty} g_j$ are the respective cumulative distributions of teams with fewer than and more than k wins. Of course, $G_k + H_{k-1} = 1$. The boundary condition is $g_{-1}(t) = 0$. The first pair of terms describes games where the stronger team wins, and the second pair of terms accounts for interactions where the weaker team wins. The last pair of terms describes games between two equal teams. The prefactor 1/2 arises because there are half as many ways to choose equal teams as there are for different teams. We consider the initial condition where all teams are equal, $g_k(0) = \delta_{k,0}$.

By summing the rate equation (A1), the cumulative distribution obeys the master equation

$$\frac{dG_k}{dt} = q\left(G_{k-1} - G_k\right) + \left(\frac{1}{2} - q\right)\left(G_{k-1}^2 - G_k^2\right). \qquad (A2)$$

The boundary conditions are $G_0 = 0$ and $G_\infty = 1$, while the initial condition for the start of each season is $G_k(0) = 1$ for $k > 0$. It is simple to verify, by summing the master equations, that the average number of wins $\langle k\rangle = \sum_k k\,(G_{k+1} - G_k)$ obeys $d\langle k\rangle/dt = 1/2$; therefore, the average number of wins by a team is half the number of games it plays, $\langle k\rangle = t/2$, as it should be.

When the number of games is large, $t \to \infty$, we can solve the master equation using a simple scaling analysis. Let us take the continuum limit of the master equation by replacing differences with derivatives, $G_{k+1} - G_k \to \partial G/\partial k$. To first order in this "spatial" derivative, we obtain the nonlinear partial differential equation

$$\frac{\partial G}{\partial t} + \left[q + (1-2q)G\right]\frac{\partial G}{\partial k} = 0. \qquad (A3)$$

Since the number of wins is proportional to the number of games played, $k \sim t$, we focus on the fraction of wins $x = k/t$. The corresponding win-fraction distribution

$$G_k(t) \to F(k/t) \qquad (A4)$$

becomes stationary in the long-time limit, $t \to \infty$. The boundary conditions for the win-fraction distribution are $F(0) = 0$ and $F(1) = 1$.

Substituting the scaled cumulative win-fraction distribution (A4) into the continuum equation (A3), we find that the scaled cumulative win-fraction distribution obeys the ordinary differential equation

$$\left[(x - q) - (1-2q)F(x)\right]F'(x) = 0. \qquad (A5)$$

Here the prime denotes differentiation with respect to x. The solution is either a constant, $F(x) = \text{constant}$, or the linear function $F(x) = (x - q)/(1 - 2q)$. Using these two solutions, invoking the boundary conditions $F(0) = 0$ and $F(1) = 1$, as well as continuity of the cumulative distribution, we deduce that the winning fraction has the form given in equation (1). In a hypothetical season with an infinite number of games, the win-fraction distribution $f(x) = F'(x)$ is uniform, $f(x) = (1 - 2q)^{-1}$, in the range $q < x < 1 - q$, while $f(x)$ vanishes outside this range. As shown in figure 4, numerical integration of the master equation (A2) confirms the scaling behavior (A4): as the number of games increases, the win-fraction distribution approaches the limiting uniform distribution.
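A minimal sketch of integrating the cumulative master equation (A2) numerically, assuming a forward-Euler time step (our choice; the text does not specify the scheme). Because the update of G_k involves only G_{k-1} and G_k, truncating at a finite k_max leaves the retained values unaffected. The function names, step size, and cutoff are hypothetical.

```python
def integrate_master_equation(q, t_final, dt=0.01, k_max=None):
    """Forward-Euler integration of dG_k/dt = q(G_{k-1} - G_k) + (1/2 - q)(G_{k-1}^2 - G_k^2).

    Returns the list G where G[k] is the fraction of teams with fewer than
    k wins at time t_final. Initial condition: G_0 = 0 and G_k(0) = 1 for k > 0.
    """
    if k_max is None:
        k_max = 2 * int(t_final) + 10   # generous cutoff; the update needs only G_{k-1}, G_k
    G = [0.0] + [1.0] * k_max
    for _ in range(int(round(t_final / dt))):
        new = G[:]                      # G[0] stays 0 (boundary condition)
        for k in range(1, k_max + 1):
            a, b = G[k - 1], G[k]
            new[k] = b + dt * (q * (a - b) + (0.5 - q) * (a * a - b * b))
        G = new
    return G

def mean_wins(G):
    # <k> = sum_k k (G_{k+1} - G_k) = sum_{k>=1} (1 - G_k) when G -> 1 at large k
    return sum(1.0 - g for g in G[1:])

t = 40.0
G = integrate_master_equation(q=0.25, t_final=t, dt=0.02)
print(round(mean_wins(G), 3))              # close to t/2 = 20
for x in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(x, round(G[int(x * t)], 3))      # compare with F(x) from Eq. (1)
```

Running the same integration to larger t_final (say 100 and 500, as in figure 4) and plotting G[int(x * t)] against x should show the profile sharpening toward the linear ramp of Eq. (1), at a higher computational cost.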